Testing a generalized leaf mass estimation method for diverse tree species and climates of the continental United States

Abstract
Estimating tree leaf biomass can be challenging in applications where predictions for multiple tree species are required. This is especially evident where there are limited or no data available for some of the species of interest. Here we use an extensive national database of observations (61 species, 3628 trees) and formulate models of varying complexity, ranging from a simple model with diameter at breast height (DBH) as the only predictor to more complex models with up to eight predictors (DBH, crown class, leaf longevity, live crown ratio, wood specific gravity, shade tolerance, mean annual temperature, and mean annual precipitation), to estimate tree leaf biomass for any species across the continental United States. The most complex model, with all eight predictors, was the best and explained 74%–86% of the variation in leaf mass. Consideration was given to the difficulty of measuring all of these predictor variables for model application, but many are easily obtained or already widely collected. Because most of the model variables are independent of species and key species‐level variables are available from published values, our results show that leaf biomass can be estimated for new species not included in the data used to fit the model. The latter assertion was evaluated using a novel “leave‐one‐species‐out” cross‐validation approach, which showed that our chosen model performs similarly for species used to calibrate the model and for those not used to develop it. The models exhibited a strong bias toward overestimation for a relatively small subset of the trees. Despite these limitations, the models presented here can provide leaf biomass estimates for multiple species over large spatial scales and can be applied to new species or species with limited leaf biomass data available.


INTRODUCTION
Forests play an important role in the function of global ecosystems and climate (Bonan, 2008), serving as important global CO2 sinks (Luyssaert et al., 2008; Wellbrock et al., 2017). The leaves themselves are a key functional attribute of forests, integrating many ecosystem processes (Atkinson et al., 2010; Wright et al., 2004). For example, leaves are a key component of the global carbon cycle, as more than one-third of atmospheric CO2 passes through leaf stomata and about half of that is fixed through photosynthesis annually (Sitch et al., 2003). Forest canopies have important local effects on climate by modulating the land-atmosphere fluxes of energy and water (Alkama & Cescatti, 2016). Leaf litter from forests plays an important role in nutrient cycling (Cornwell et al., 2008) and many other ecosystem processes (Chapin et al., 2011). Thus, there are many reasons to improve the quality and quantity of information describing the foliage component of forest ecosystems.
Leaf mass can provide information on plant investment in leaf tissue, which has been shown to correlate with woody plant functional types (Duursma & Falster, 2016). Leaf mass is also related to other plant dimensions of interest, such as specific leaf area, which is correlated with the Huber value (xylem sapwood area/leaf area) (Mencuccini, Rosas, et al., 2019), a parameter that can be used in modeling plant water use (Mencuccini, Manzoni, et al., 2019). Leaf mass and leaf-mass-based traits have been shown to be critically important, integrative measures of leaf performance and function (Atkinson et al., 2010; Reich et al., 1997), though these traits vary over multiple orders of magnitude across a worldwide spectrum of species and ecosystems (Wright et al., 2004).
The great variation in leaf mass between trees and across forest ecosystems is echoed in calls to develop better estimators of tree leaf biomass to improve forest ecosystem inventories, especially at large spatial scales (Clough et al., 2016; Weiskittel et al., 2015). However, recent approaches to creating generalized tree mass models, that is, "global" allometric models, have focused largely on total tree mass (Chave et al., 2014; Jucker et al., 2017). Though some remote sensing approaches, like terrestrial laser scanning (TLS), are promising to improve biomass estimation (Stovall et al., 2018), estimating foliage biomass with TLS or other methods poses challenges (Stovall et al., 2017).
One of the biggest challenges in improving leaf mass estimation at large spatial scales has been acquiring sufficient data to calibrate species-specific models (Clough et al., 2016; Poudel & Temesgen, 2016). This problem has been addressed by combining species into broad taxonomic groups (Chojnacky et al., 2013; Jenkins et al., 2003), but these groupings ignore basic differences between species within taxonomic groups (Clough et al., 2016). A more promising approach might be to group species by leaf functional attributes that relate to their life-history (Reich et al., 2014; Wright et al., 2004).
Building on the idea of a spectrum of leaf traits (Reich et al., 1997; Wright et al., 2004), Dettmann and MacFarlane (2019) developed a "trans-species" leaf biomass estimation model using two numerically quantifiable leaf traits (shade tolerance and leaf longevity) to represent any species on a numeric scale; however, they found that tree size, vigor, and local competitive environment, not species traits, were the dominant predictors of tree leaf mass. The resultant general leaf mass estimation model gave very good estimates of leaf mass for trees of 17 different tree species (11 hardwoods and 6 softwoods), selected to maximize life history trait differences between species at 21 locations across the US state of Michigan. To show that this model is actually "transferable" across species (sensu Wenger & Olden, 2012), this approach needs further testing over a wider range of tree species, forest ecosystems, and climates for the purpose of assessing general behavior and robustness. For the purposes of this paper, we use the term "trans-species" from Dettmann and MacFarlane (2019) to refer to the transferability of the model across species.
In this study, we used an extensive nationally derived data set to examine the generality of leaf mass estimation relationships identified by Dettmann and MacFarlane (2019) over diverse tree species, regions, and climates of the United States and to determine the feasibility of its use to estimate leaf biomass for the USDA Forest Service Forest Inventory and Analysis (FIA) program, which conducts the national forest inventory (NFI) and the associated parts of the US greenhouse gas inventory. The new study includes additional climate data, as well as variables from the original model of Dettmann and MacFarlane (2019), and gives full consideration to the feasibility of acquiring the data necessary to test and apply the model over an area as large and geographically diverse as that covered by the US NFI. The specific objectives of this study were to (1) compile and summarize available national tree foliage biomass data for the United States, (2) fit and evaluate trans-species models of varying complexity, (3) examine developed relationships across a variety of contrasting factors, and (4) assess model robustness and species transferability using a novel "leave-one-species-out" cross-validation approach.

Study area and data
The leaf biomass data collected in Michigan to generate the original model by Dettmann and MacFarlane (2019) were part of a larger study conducted by seven US universities in partnership with FIA from 2011 to 2021 to improve the US forest biomass inventory using detailed measurements of tree aboveground biomass components, including foliage mass. Sampling of field data was focused on species that make up roughly two thirds of the growing stock in the United States and on filling gaps in existing data (Frank et al., 2019). These newly collected data were combined with existing individual tree biomass observations obtained from destructive sampling records from past North American studies spanning 1960 to the present (Baldwin, 1989; Baldwin & Saucier, 1983; Clark, 1977; Clark et al., 1986a, 1986b; Clark III & Schroeder, 1985; Lohrey, 1984; McNab & Clark, 1982; Mroz et al., 1985; Phillips & McNab, 1982; Radtke et al., 2015). These historical tree biomass studies from across the nation are now available online (http://www.legacytreedata.org/). When collating such a large data set, care was taken to examine metadata so as not to include trees that were not fully leafed out. Not every study followed exactly the same protocol, so differences among data sets may be a source of statistical noise, error, and outliers (discussed later).
Species represented in the combined data set covered a wide range of shade tolerance, leaf longevity, and stem diameter at breast height (DBH) (in centimeters, 1.37 m above ground) (Table 1). Study sites were located across the United States covering a range of climatic conditions and including 61 different tree species (Figure 1). Collectively, the data set represents a diverse set of tree species and a wide variety of life-history traits (Table 2), growing over a wide range of ecological and climatic regions (Figure 1), providing a robust data set to test the broader [...]

Tree sample data selection
We selected trees from the combined database having measured leaf mass (M_l); stem DBH; wood specific gravity (SG), which is the "basic" density (g/cm³) of the wood on an oven-dried weight and green volume basis divided by the density of water; live crown ratio (LCR), which is the length of live crown divided by the total height of the tree; and crown class (CC). CC was assigned to each tree based on the position of the tree in the canopy: (1) overtopped, [...] Zanne et al. (2014). The latter was to determine whether a published value could be substituted for measured values of SG calculated from samples taken at breast height (as suggested by MacFarlane, 2015), which would be impractical to obtain for every tree in the US NFI. Trees were assigned species-specific leaf longevity (LL) values based on published averages from the GLOPNET database (Wright et al., 2004), available from the TRY plant database (Kattge et al., 2011), as well as published values from Hallik et al. (2009), Niinemets and Lukjanova (2003), Pinchot (1907), Balster and Marshall (2000), Gower et al. (1989), and Reich et al. (1999). Trees were assigned species-specific shade tolerance (ST) ratings on a scale of 1 (shade intolerant) to 5 (shade tolerant) using published values from Niinemets and Valladares (2006).
The geographic coordinates of the sample locations were used in conjunction with 2.5-min (~50 km²) resolution WorldClim global average monthly climate data from 1970 to 2000 (Fick & Hijmans, 2017) to assign mean annual temperature (MAT) (°C) and mean annual precipitation (MAP) (mm) to each tree. These variables were used to represent biological (DBH, LCR, CC), species- [...]
FIGURE 1 Sample site locations within the continental United States of all 3628 sample trees used.
TABLE 2 Summary of tree attributes and sample sizes by species; means and ranges provided with SD in parentheses

Model formulation and variable selection
The initial model followed a form similar to that recommended by Dettmann and MacFarlane (2019). However, because data on tree growth rates and neighboring trees were insufficient for many of the trees in this study, the basal area increment and competition index variables from the Dettmann and MacFarlane (2019) model were not included. Instead, MAP (mm) and MAT (°C) were added to account for gross variation in climate across sites, resulting in the model referred to as Equation (1). The variables in Equation (1) were log transformed because the relationships between leaf mass and predictors were generally observed to be heteroscedastic, and this allowed nonlinear trends to be captured with the benefits of a linear modeling framework (Appendix S1: Figure S1). To apply the log transformation to MAT, whose values could be negative, MAT was translated by adding 30 to each value; 30 was selected to ensure that a positive result was obtained for even the minimum MAT value.
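The transformation described above can be illustrated with a minimal sketch. It assumes a log-log linear form in which crown class enters as a class-specific intercept (as the Table 3 caption indicates); the function names, the reduced set of predictors, and the choice to log-transform every continuous predictor are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

MAT_SHIFT = 30.0  # translation stated in the text so that log(MAT + 30) is defined

def design_matrix(dbh, lcr, mat_c, map_mm, crown_class):
    """Log-log design matrix with one intercept column per crown class.

    Hypothetical helper: only four of the paper's predictors are included,
    and logging every continuous predictor is an assumption of this sketch.
    """
    crown_class = np.asarray(crown_class)
    classes = np.unique(crown_class)
    # One indicator column per observed crown class (class-specific intercept).
    intercepts = (crown_class[:, None] == classes[None, :]).astype(float)
    slopes = np.column_stack([
        np.log(dbh),
        np.log(lcr),
        np.log(np.asarray(mat_c, dtype=float) + MAT_SHIFT),  # shift, then log
        np.log(map_mm),
    ])
    return np.hstack([intercepts, slopes])

def fit_log_linear(X, log_leaf_mass):
    """Ordinary least squares on the log scale."""
    beta, *_ = np.linalg.lstsq(X, log_leaf_mass, rcond=None)
    return beta
```

Shifting MAT by a constant changes the interpretation of its coefficient but keeps the model linear in the parameters, which is why a standard least-squares fit still applies.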
An alternative to Equation (1), Equation (2), included published SG (SG_p) rather than measured SG, to address the concern that SG is very unlikely to be measured on every tree as part of any NFI. Equations (1) and (2) were fit to the data set using backward stepwise regression with the package MASS (Venables & Ripley, 2002) in the R environment (R Core Team, 2019) to find the optimum model to estimate the dry mass of leaves. Though we were seeking the optimum model, we also retained simpler models (reduced versions of Equation 1 or 2) as potential candidates, because we wanted to see how much predictive power would be lost by dropping certain variables, especially those that would be more costly to measure or acquire, since an important part of this research was to create a model that could be broadly implemented for the US NFI. Four more models (Equations 3–6) were selected to represent models of varying complexity and differing data collection limitations. Collinearity of the variables included in the models was evaluated using the variance inflation factor (VIF). No VIF in the fully specified model, Equation (2), was over 5, so no major concerns of multicollinearity among our predictor variables were identified (Appendix S1: Table S1).
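The collinearity check can be sketched as follows. This is a generic illustration of the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors; the function name is hypothetical and the sketch assumes each predictor occupies a single numeric column, which may differ from how the authors handled the categorical crown class.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (X must not contain an intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on all other predictors plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

A value near 1 indicates a predictor that is nearly uncorrelated with the others; the threshold of 5 used in the text is a common rule of thumb.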

Model comparisons, validation, and assessment
Four error metrics were used to compare models to each other: mean percentage error (MPE), mean absolute error (MAE), mean absolute percentage error (MAPE), and mean arctangent absolute percentage error (MAAPE). These were calculated for all fitted models from n, the total number of trees; a_t, the observed foliage mass; and p_t, the estimated foliage mass from the model for tree t. The error metrics were calculated using the nontransformed observed values and back-transformed estimated values, using the correction factor given by Baskerville (1972). Two validation methods were used. The first was a 10-fold cross validation. The second was a k-fold cross validation in which one species was removed from the data at a time and the model was refitted; we called this "leave-one-species-out" cross validation. It was done to assess how each model might perform on species for which no data were observed. Models were then compared to one another using MPE, MAE, MAPE, MAAPE, adjusted R², and Akaike's information criterion (AIC).
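A minimal sketch of the four metrics, using their commonly published definitions, is given below. The sign convention for MPE (predicted minus observed, so that positive values indicate overestimation) is an assumption chosen to match how the results are described; the back-transformation uses the usual exp(σ²/2) correction for natural-log models, with σ² the residual variance on the log scale. Function names are hypothetical.

```python
import numpy as np

def baskerville_backtransform(log_pred, resid_var):
    """Back-transform natural-log predictions with the exp(sigma^2 / 2) correction."""
    return np.exp(np.asarray(log_pred, dtype=float) + resid_var / 2.0)

def error_metrics(observed, predicted):
    """Compute MPE, MAE, MAPE, and MAAPE on the original (untransformed) scale."""
    a = np.asarray(observed, dtype=float)   # a_t: observed foliage mass
    p = np.asarray(predicted, dtype=float)  # p_t: estimated foliage mass
    rel = (p - a) / a  # assumed sign: positive means overestimation
    return {
        "MPE": 100.0 * rel.mean(),
        "MAE": np.abs(p - a).mean(),
        "MAPE": 100.0 * np.abs(rel).mean(),
        "MAAPE": np.arctan(np.abs(rel)).mean(),  # bounded above by pi / 2
    }
```

Because the arctangent is bounded, MAAPE is less affected than MAPE by a few very large relative errors, which is relevant when a small subset of trees is badly overestimated.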
To further understand the relative contributions of predictors to model accuracy, we calculated the relative importance of each variable using dominance analysis (Budescu, 1993; Johnson & Lebreton, 2004). This decomposed the model-explained variance into nonnegative contributions from each variable based on sequential R² values, while accounting for the dependence on the order in which the variables are added by averaging over orderings (Grömping, 2006).
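The order-averaging idea can be sketched as below: for every ordering of the predictors, each variable is credited with the increase in R² when it enters the model, and the credits are averaged over all orderings. This is an illustrative brute-force version with hypothetical function names; it assumes each predictor is a single numeric column and is not the authors' implementation.

```python
import numpy as np
from itertools import permutations

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the listed columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def relative_importance(X, y):
    """Average sequential R^2 contribution of each predictor over all orderings."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    shares = np.zeros(k)
    orderings = list(permutations(range(k)))
    for order in orderings:
        included, r2_prev = [], 0.0
        for j in order:
            included.append(j)
            r2_now = r_squared(X, y, included)
            shares[j] += r2_now - r2_prev  # credit for entering at this position
            r2_prev = r2_now
    return shares / len(orderings)
```

The shares sum to the R² of the full model, so dividing by that total gives percentages of explained variance. With eight predictors there are 40,320 orderings, which is still tractable for a sketch but is usually computed more efficiently over subsets.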

Variables influencing leaf mass
Interaction terms were initially considered, and multiple interactions were found to be significant (at the 0.05 level). Despite this, these terms added very little explanatory power (adjusted R² increase of 0.02). Owing to the lack of value added in terms of model fit and parsimony, interaction terms were removed from further consideration.
The coefficients for each of the models explored are shown in Table 3. All initial variables from both Equations (1) and (2) were retained when subjected to backward stepwise regression, using AIC as the criterion, indicating that all model variables contributed significantly to explaining leaf biomass. Figure 2 shows the relative importance of each variable in the model containing DBH, LCR, ST, LL, CC, SG_p, MAT, and MAP in Equation (2). The most important predictor was tree size (DBH), with a relative importance of 64.39% overall, followed by a tree's CC at 22.39%. The next most important variables were LL (5.91%) and LCR (4.83%). The remaining variables of SG_p, ST, MAT, and MAP each held much lower relative importance, with values of 0.61%, 0.67%, 0.91%, and 0.30%, respectively (Figure 2).
As expected, larger trees (DBH) with longer crowns (LCR) had more foliage mass. Specific gravity also had a positive relationship with total leaf mass. Both of the life history traits included (ST and LL) showed a positive relationship with foliage mass, indicating that trees with longer lived leaves and increased tolerance to shade maintained greater leaf masses, holding other factors constant. When included, MAT also had a significant and positive relationship with leaf mass. MAP, on the other hand, was negatively related to leaf biomass when all other factors were included. Thus, holding other factors constant, trees in warmer, drier climates maintain a greater leaf mass. Table 4 shows fit statistics for all the models explored. Equations (1) and (2) were the best models and very similar statistically, though the model fitted with SG_p performed slightly better with regard to AIC (partial dependency plots for Equation 2 can be found in Appendix S1: Figure S2). The remaining models showed increasingly poorer performance in this order: Equation (6) > Equation (5) > Equation (4) > Equation (3). For all models (Table 4), the leave-one-species-out cross validation showed relatively small differences in MPE, MAE, MAPE, and MAAPE from those derived from the 10-fold cross-validation approach, indicating a general tendency of the models to work as well for species not included during model development (Table 5).
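The comparison between the two validation schemes rests on how the folds are formed. A minimal sketch of the leave-one-species-out scheme, assuming a design matrix on the log scale and an array of species labels (all names here are hypothetical), could look like this:

```python
import numpy as np

def leave_one_species_out(species):
    """Yield (held_out_species, train_idx, test_idx), one fold per species."""
    species = np.asarray(species)
    for sp in np.unique(species):
        test_mask = species == sp
        yield sp, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def loso_predictions(X, y_log, species):
    """Refit OLS without each species in turn and predict its held-out trees."""
    X = np.asarray(X, dtype=float)
    y_log = np.asarray(y_log, dtype=float)
    preds = np.empty_like(y_log)
    for _, train, test in leave_one_species_out(species):
        beta, *_ = np.linalg.lstsq(X[train], y_log[train], rcond=None)
        preds[test] = X[test] @ beta  # species was never seen during fitting
    return preds
```

Unlike random 10-fold splits, every tree of the held-out species is excluded from fitting, so the predictions rely only on tree-level measurements and published species-level trait values.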
All the models showed a mean tendency to overestimate leaf mass, with MPE ranging from ~30% to ~67% across all models in the 10-fold cross validation. However, the distribution of MPEs (Figure 3) showed that the models estimated leaf mass well for most of the population of trees (tall peaks near 0% MPE), with very large overestimation errors for a relatively small subset of trees (right-skewed distributions), which heavily skewed the means. This skewness meant that the median prediction error was much lower than the mean (compare Figure 3 to Table 4), which was noticeably greater for Equations (3), (4), and (5) than for Equations (1), (2), and (6). In terms of absolute mass, the MAE from 10-fold cross validation ranged from 4.60 to 5.92 kg compared to a mean leaf mass of 11.97 kg, indicating an approximate relative error range of around ±37%–50% of tree leaf mass, whereas the median MPE ranged between 9% and 19% (Figure 3).

Influence of tree size and crown shading on leaf mass estimation
Of the variables that we studied, DBH was the most important for predicting a tree's leaf mass. This was expected given well-documented allometric scaling relationships between DBH and other components of tree mass, including foliage (Chojnacky et al., 2013; Clough et al., 2016; Wirth et al., 2004). The relative importance of DBH proved to be the largest of any predictor examined (64.39%), but the model with DBH alone yielded a percentage error and bias in prediction of foliage mass roughly double those of the best models tested (Table 4).
The next most important variable was CC (22.39%), suggesting that the position of a tree in the canopy in relation to others plays an important role in determining its leaf mass. Such a finding corroborates the findings from a study by Le Goff et al. (2004), where different measures of competition were correlated with leaf mass. Here, our categorical variable of CC appeared to encompass some of the effects of how trees are able to alter the resources available to neighbors (Palik et al., 1997). A tree's position in the canopy reflects resources available to it (Canham et al., 2004), and it [...] in addition to crown shape (MacFarlane et al., 2003; Muth & Bazzaz, 2003). Competition experienced can also alter expressed leaf traits such as leaf mass per area due to light availability (Ellsworth & Reich, 1993). LCR was another important factor for estimating leaf mass, and other studies have confirmed this finding (Loomis et al., 1966; Temesgen et al., 2011). LCR influences the scaling relationship between live woody mass (i.e., sapwood) and leaf mass in trees (Mäkelä & Valentine, 2006) and can serve as a proxy for vigor. This may explain why it was relatively more important in this study, where tree vigor was not measured directly, than in the study by Dettmann and MacFarlane (2019), where tree vigor (most recent 5- and 10-year basal area growth) was measured. Likewise, LCR may be a proxy for competition experienced by a tree because it integrates the effect of crowding on the reduction of leaf mass via the reduction of LCR (Antos et al., 2010). A major problem in the use of LCR for the US NFI is that the FIA program records a "compacted" crown ratio (CLCR) on all FIA plots (US Forest Service, 2018) but records LCR on only a subset. The difference between LCR and CLCR can be as much as 17%, so a model would be required to translate CLCR to LCR (Toney & Reeves, 2009). Attempts to do so have shown significant bias and poor performance for trees of different types (Randolph, 2010); however, equations to convert CLCR are available across all four FIA regions, which could provide nationwide coverage of LCR. Little additional effort is required to record LCR along with CLCR, and, considering the subjective nature of CLCR, there have been calls to have LCR measured on all plots in the US NFI (Monleon et al., 2004; Pollard et al., 2006).
TABLE 3 Model coefficients and associated standard errors in parentheses. Models were fitted using all 3901 tree samples. Each model contains selected variations of complexity, with model Equations (1) and (2) being the result of backward stepwise regression on all variables, with field-measured or published values for specific gravity, respectively. Crown class serves as a variable intercept for all models but Equation (3).
FIGURE 2 Relative importance of variables for prediction of leaf mass, rounded to the nearest 0.01%, for the ordinary least squares model with diameter at breast height (DBH), specific gravity as given by Chave et al. (2009) (SG_p), mean annual temperature (MAT), mean annual precipitation (MAP), crown classification (CC), leaf longevity (LL), shade tolerance (ST), and uncompacted live crown ratio (LCR). The relative importance decomposes the model-explained variance into nonnegative contributions from each variable based on sequential R² but accounts for the dependence on the order in which the variables were added by averaging over all possible orderings (Grömping, 2006).
TABLE 5 Cross-validation results from leave-one-species-out cross validation of mean error, mean percentage error (MPE), and mean absolute percentage error (MAPE) for model Equation (2)

Species-specific variables and the idea of a trans-species model
The generalized leaf mass model is based on the idea of a transferable model, where the model could work for any tree of any species for which LL, ST, and SG can be defined (Dettmann & MacFarlane, 2019). The recommended approach requires trees to be assigned values for LL, ST, and SG_p based on published species values, so species is present in the model, at least implicitly. One of the values of this type of trans-species approach is that it allows prediction for a new species for which there were no data or insufficient data to use for model development.
Together, the combined relative importance of LL and ST was small (~6.5% combined) in comparison to some other variables included, demonstrating that the large intraspecific variation in leaf mass allocation is dependent on other factors, such as size or competitive environment (Table 4 and Figure 3). The relatively lower importance of LL and ST may be in part due to error in attributing them to each individual tree. These published values are only species averages with their own uncertainty, since LL and ST may vary across populations and life stages. Including these two variables, however, did reduce MPE and MAPE by approximately 20% and 25%, respectively, in both 10-fold cross validation and leave-one-species-out cross validation (Table 4). Thus, inclusion of LL and ST in the trans-species model provided important information for differentiating among trees similar in size and competitive environment. Further, adding these traits helped to reduce overestimation bias (Figure 3), which may have been caused by model coefficients being more heavily weighted toward some species than others owing to an unequal representation of some species in the data (Table 2). LL held a relative importance of 5.91% and served to place all trees on a meaningful continuous scale, directly related to leaf function. Here, trees with a greater leaf longevity show a tendency to maintain a greater leaf mass, consistent with at least one other study (Reich et al., 1992), likely due to the strong positive relationship between LL and leaf mass per area (LMA) (Osnas et al., 2013). Additionally, species with a longer LL must maintain their leaves longer to achieve a positive carbon balance (Reich, 1987), so leaf mass naturally accumulates for species with longer-lived leaves.
Shade tolerance showed a lower relative importance (0.67%) compared to LL, with more shade-tolerant trees having significantly more leaf mass. Callaway et al. (2000) also found that shade-tolerant trees held a greater leaf mass. One explanation for this is that shade-tolerant trees are able to maintain additional leaves in lower light environments, such as on lower branches, deeper within their crowns (Valladares & Niinemets, 2008). This leads shade-tolerant trees to have deeper crowns (Canham et al., 1994), with efficient leaf display in lower canopy positions (Abrams & Kubiske, 1990; Canham, 1988). With many more species included in this trans-species leaf mass model than in the original model by Dettmann and MacFarlane (2019; 61 vs. 17), LCR became a relatively more important predictor than ST; this suggests a fairly high degree of variation in LCR at any given level of ST, making LCR a more reliable predictor of leaf mass across many species. Similarly, the higher relative importance of CC than ST highlights how expression of shade tolerance is likely affected by an individual tree's light environment.
We found that published species-specific values for SG were about as good a predictor as measured values, likely because they captured the mean trend of species-specific, leaf mass-wood density relationships (Chave et al., 2009). Trees with higher SG (and SG_p) tended to carry more foliage when accounting for other variables, though SG_p had a relative importance of only about 0.61%. Forrester et al. (2017) also found a positive relationship between foliage mass and wood density but noted a high degree of variability within and between species. This variation of wood specific gravity within (Lei et al., 1996) and between species (Chave et al., 2006) can be seen with measured values of SG (Table 2), while SG_p only represents the mean variation between species. Trees with denser wood have been noted to tolerate competition to a greater degree (Kunstler et al., 2015), so SG may be confounded with other variables in the model, such as CC.

Climatic influences
Our new model captures general trends of how climate affects tree leaf mass accumulation, where leaf mass increases with increasing MAT and decreases with increasing MAP. Climatic variation was shown in other studies to correlate with leaf traits worldwide, with LMA being positively correlated with MAT and increasing as rainfall decreases (Wright et al., 2004). Here, both climatic variables showed a relatively low importance as predictors (0.91% and 0.30% for MAT and MAP, respectively) compared to other variables. Thus, our results suggest that, for trees at different locations experiencing different climatic conditions, the size and local competitive environment (expressed in CC) of a tree are the dominant forces determining its leaf mass. Nonetheless, that should not be interpreted to indicate that climate does not have an important influence on tree leaf mass. Since trees are adapted to their local environments and exhibit plastic growth, it is likely that the other variables and functional traits related to life history already incorporate climate effects. Including climate variables was a significant improvement over measuring only the tree size metrics DBH and LCR, so climate variables likely capture some general information about species, since even common species tend to specialize in certain climatic regions.

Explanations for overestimation bias
The models showed a skewness toward overestimation bias, which bears greater discussion (Figure 3). We expect that this trend reflects both the variables selected and the underlying data. In terms of the variables selected, we can see that models which rely heavily on the size of the tree (here DBH and LCR) tended to produce the greatest overestimation bias (Equations 3 and 4, Figure 3). We assume that most of the trees sampled were relatively healthy when destructively sampled for foliage mass estimation, having a "full" complement of leaves for their size, but that some of the trees sampled were likely less vigorous, which would cause the model to be more likely to overestimate leaf mass for a tree based only on its size. We can see that the model with DBH alone had the largest skewness and adding LCR and CC, which might index competitive stress, reduced it somewhat (Figure 3).
Since we drew data from a large number of historical studies, we could not confirm that every study only selected "healthy" trees for its sampling. Further, our author group's extensive experience with tree biomass estimation from destructive sampling indicates that one is likely to underestimate biomass components because measurement errors are most likely to be omissions of biomass. For example, leaves and needles may be lost when a tree is felled, and leaf mass may be incomplete when a tree is sampled too early in the growing season, before leaves have had time to fully initiate (Dettmann, 2017), or too late in the season, when leaf mass may have been lost to senescence. During examinations of the data, we removed some obvious underestimation outliers, but only from data the authors collected themselves; it was difficult to determine such outliers from historical data sets. Finally, alternative sampling methods have been used for determining foliage biomass, which can have a significant influence on the overall value (e.g., Temesgen et al., 2011). Nonetheless, skewness toward overestimation was reduced by adding more species-specific functional traits to the model (cf. Equations 1, 2, and 6 to 3, 4, and 5 in Figure 3), suggesting the source of some of the estimation errors could be a result of relatively small sample sizes or unique leaf traits.
Some of the largest errors for species were found for Taxodium distichum (bald cypress), Carya glabra (pignut hickory), Ilex opaca (American holly), and Magnolia virginiana (sweetbay magnolia). Though we could not explain the exact sources of error for each species, I. opaca was the only broad-leaved, evergreen species within our database, and there was only one sample from it. T. distichum was one of few deciduous coniferous species within our study, with only 12 samples. Additionally, MPE and MAPE, as relative error metrics, are often sensitive to even small prediction errors in small trees. Measures such as these can be inflated and skewed when many small trees are included in the sample. This pattern can be seen in the small sample specimens of T. distichum, which had mean errors >200% but leaf mass averaging only 2.4 kg. Thus, we may have lacked the necessary samples to fully describe species that lie outside of the norm of the most common deciduous broad-leaved or evergreen, needle-leaved tree species, and the models may overestimate for a subset of small trees, because published values of LL and ST are for mature trees. Another limitation revealed during our "leave-one-species-out" cross validation was that there are many species with very few samples, where there is either only one individual or potentially only one site sampled. These were left in our analysis, however, because we were specifically interested in the transferability of the model and how it might perform for an undersampled or rare species.
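The sensitivity of relative error metrics to small trees can be illustrated with made-up numbers. In the sketch below, the leaf masses are hypothetical and every tree is given the same absolute overestimate of 1 kg; only the percentage errors differ.

```python
import numpy as np

def percentage_errors(observed, predicted):
    """Signed percentage error per tree (positive means overestimation)."""
    a = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * (p - a) / a

# Hypothetical leaf masses (kg), chosen only for illustration.
observed_kg = np.array([0.5, 2.0, 10.0, 50.0])
predicted_kg = observed_kg + 1.0  # identical +1 kg error for every tree

pe = percentage_errors(observed_kg, predicted_kg)
mean_pe = pe.mean()
median_pe = np.median(pe)
```

Here the smallest tree alone contributes a 200% error, which pulls the mean percentage error (65.5%) well above the median (30%) even though the absolute error is the same for all four trees. This is similar in kind to the inflation described for the small T. distichum specimens.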
Another potential source of error could be the combination of so many data sources. Some variation could have crept in from differences in sampling methods, from measurement error, or from trees sampled while senescing or in decline. We conducted a simple cross validation on Model 2, leaving one study (author) out at a time, which identified some studies that may be contributing more to the overall error (Appendix S1: Table S2). But disentangling each study from other confounding factors, such as species, geography, sampling method, and site variations, would be extremely difficult, if not impossible.
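The general logic of a leave-one-group-out cross validation, whether the group is a species or a study, can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure used in this study: a DBH-only log-log regression stands in for the fitted models, the function names and records are hypothetical, and no correction for back-transformation bias is applied.

```python
import math


def fit_loglog(dbh, mass):
    """Ordinary least squares fit of ln(mass) = a + b * ln(dbh); returns (a, b)."""
    x = [math.log(d) for d in dbh]
    y = [math.log(m) for m in mass]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def leave_one_group_out(records):
    """records: list of (group, dbh, mass). Returns {group: MAPE in percent}."""
    errors = {}
    for held_out in sorted({g for g, _, _ in records}):
        train = [(d, m) for g, d, m in records if g != held_out]
        test = [(d, m) for g, d, m in records if g == held_out]
        # Fit without the held-out group, then predict that group's trees.
        a, b = fit_loglog([d for d, _ in train], [m for _, m in train])
        pct = [100.0 * abs(math.exp(a + b * math.log(d)) - m) / m for d, m in test]
        errors[held_out] = sum(pct) / len(pct)
    return errors


# Hypothetical records: groups A and B follow the same allometry,
# while group C carries twice the leaf mass at a given DBH.
records = [
    (group, d, k * 0.05 * d ** 1.8)
    for group, k in [("A", 1.0), ("B", 1.0), ("C", 2.0)]
    for d in (10.0, 20.0, 40.0)
]
errors = leave_one_group_out(records)
for group, err in errors.items():
    print(f"{group}: MAPE = {err:.1f}%")
```

In this toy example, the group that departs from the shared allometry shows the largest held-out error, which is how such a procedure can flag species or studies that contribute disproportionately to overall error.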

Conclusion and recommendations for application
Overall, the trans-species leaf mass model performed well for the vast majority of a wide variety of tree species across the large ecological domains of the United States. Our overall recommendation, given the results, would be to use Equation (2) as the best generalized foliage mass equation for the continental United States. Most of the tree-level variables are already measured on all the NFI plots for the United States, with the exception of LCR (see Discussion section). The three species functional traits, ST, LL, and SGp, are available for a large number of species. Of 443 "forest-growing" tree species listed in the FIA database (the database now includes many urban tree species), we were able to find ST values for 237 (about 53%) from Niinemets and Valladares (2006). These included all the tree species in this study, which comprised the most common or well-studied species (most of the species in the Legacy tree database).
Leaf longevity values specific to US tree species are relatively less abundant in the GLOPNET database (Wright et al., 2004), but we were able to find values for all the species in this study from additional sources (see Methods). This highlights the need to fill knowledge gaps related to ST and LL, which are unknown for many species found in large databases, such as the US NFI. In the case of LLs, it might be reasonable to assume similar values for congeneric species. It might also be possible to assign species reasonable ST values using qualitative descriptors found in textbooks on dendrology or silvicultural manuals, but it would be more valuable to expand the seminal methodology of Niinemets and Valladares (2006) to try to quantify ST for many more species.
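The suggestion of borrowing LL values from congeneric species could be implemented as a simple lookup with a genus-level fallback. The sketch below is only one possible approach: the function name, the species names, and the LL values (in months) are hypothetical placeholders, and using the congeneric mean is an assumption, not a method tested in this study.

```python
from statistics import mean


def leaf_longevity(species, known_ll):
    """Return (LL value, source) for a binomial name.

    Falls back to the mean of congeneric species when the species
    itself has no published value (assumed fallback rule).
    """
    if species in known_ll:
        return known_ll[species], "species"
    genus = species.split()[0]
    congeners = [v for name, v in known_ll.items() if name.split()[0] == genus]
    if congeners:
        return mean(congeners), "genus"
    return None, "missing"


# Placeholder names and values (months); not real trait data.
known_ll = {
    "Alphus one": 24.0,
    "Alphus two": 36.0,
    "Betus one": 6.0,
}

print(leaf_longevity("Alphus two", known_ll))    # value published for the species
print(leaf_longevity("Alphus three", known_ll))  # borrowed from congeners
print(leaf_longevity("Gammus one", known_ll))    # no value available
```

Returning the source alongside the value keeps borrowed values distinguishable from published ones, so that their influence on predictions can be checked later.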
All 443 forest tree species in the FIA database had published values for SGp. To use measured SG for biomass prediction on standing trees, cores would need to be either sampled from each individual tree or extracted from sample trees to obtain SG estimates for the population (Williamson & Wiemann, 2011). This could prove time consuming and expensive, and, given the small difference in predictive power between SG and SGp, it would appear that SGp is sufficient. Similarly, Chave et al. (2005) recommended using SGp for whole-tree mass estimation models. However, MacFarlane (2015) found that measured SG could provide lower model error when used instead of SGp in a formal assessment. In the latter case, the focus was on the woody components of total aboveground biomass, whereas this study is the first to test the difference between using SGp and SG for predicting leaf mass.
Our overall conclusion is that the generalized model for leaf mass estimation developed here captures the most important factors affecting leaf biomass variation among trees of diverse species, with diverse life-history traits, growing in a variety of climates in the United States and potentially in other parts of the world where the required information is available, such as Canada. As such, it appears that the model could be implemented to predict leaf biomass for approximately 227 of the 443 tree species listed in the FIA database, which have published values for ST and SG and for which approximate LL values can likely be obtained, if not already available. In the event that this is not sufficient, we also provided reduced versions of the trans-species leaf mass model, which could also be used. There could also be a case for developing species-specific models for species of direct interest for which there are large amounts of data and for which the data cover a significant portion of the species' spatial range. Yet for species with little or no data on foliage mass, our model would be quite useful. In general, given the findings of this study, the trans-species modeling approach deserves extension to other species, countries, and biomass components.

ACKNOWLEDGMENTS
Special thanks to David Walker for his contributions in compiling the legacy data. Funding for this research was provided through the USDA Forest Service Forest Inventory and Analysis Program. Part of David W. MacFarlane's time was supported with funds from the USDA National Institute of Food and Agriculture.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest related to this manuscript.